-- The MAILING DATE of this communication appears on the cover sheet with the correspondence address --
Period for Reply 

A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) FROM 
THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed
after SIX (6) MONTHS from the mailing date of this communication. 
earned patent term adjustment. See 37 CFR 1.704(b).

6) Claim(s) 1-52 is/are rejected.

DETAILED ACTION

Response to Amendment 

1. The examiner acknowledges amended claim 22 filed on 12/29/2003.

Response to Arguments 

2. Applicant's arguments with respect to claims 1-52 have been considered. 

3. A response to the remarks will not be given because they are moot in view of the new 
ground(s) of rejection. 

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that 
the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary 
skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the 
invention was made. 

5. Claims 1-4,7-10,13-17,20-25,28-33,36-40,43-48,51 and 52 are rejected under 35 U.S.C.
103(a) as being obvious over Monier (U.S. Patent No. 5,974,455) in view of Najork (U.S. Patent 
No. 6,301,614). 

6. The applied reference has a common inventor with the instant application. Based upon 
the earlier effective U.S. filing date of the reference, it constitutes prior art only under 35 U.S.C. 



Application/Control Number: 09/607,710 Page 3 

Art Unit: 2157 

102(e). This rejection under 35 U.S.C. 103(a) might be overcome by: (1) a showing under 37 
CFR 1.132 that any invention disclosed but not claimed in the reference was derived from the
inventor of this application and is thus not an invention "by another"; (2) a showing of a date of 
invention for the claimed subject matter of the application which corresponds to subject matter 
disclosed but not claimed in the reference, prior to the effective U.S. filing date of the reference 
under 37 CFR 1.131; or (3) an oath or declaration under 37 CFR 1.130 stating that the
application and reference are currently owned by the same party and that the inventor named in 
the application is the prior inventor under 35 U.S.C. 104, together with a terminal disclaimer in 
accordance with 37 CFR 1.321(c). For applications filed on or after November 29, 1999, this 
rejection might also be overcome by showing that the subject matter of the reference and the 
claimed invention were, at the time the invention was made, owned by the same person or 
subject to an obligation of assignment to the same person. See MPEP § 706.02(l)(1) and §
706.02(l)(2).

7. In reference to claims 1,13,23,31,38 and 46, Monier teaches downloading data sets from
among a plurality of host computers comprising the following steps: 

Storing representations of data set addresses in a set of data structures, including a first 
buffer, a second buffer and a first disk file, wherein representations of data set addresses stored 
in the first disk file are ordered (column 3, lines 1-35, Monier discloses storing URL 
representations in a set of data structures, including a hash table (stored in random access 
memory (RAM)), an append buffer (stored in RAM) and a sequential disk file, wherein the 
representations are stored sequentially in the disk file). 

Selecting as a current buffer one of the first and second buffers (column 6, lines 35-45,
Monier discloses selecting and managing a current buffer among the hash table and append 
buffer). 

Downloading at least one data set that includes addresses of one or more referred data 
sets (column 5, lines 20-30, Monier discloses fetching web pages that include URLs of one or
more referred web pages).

Identifying the addresses of the one or more referred data sets (column 5, lines 20-30, 
Monier discloses analyzing and identifying the addresses of the one or more referred web pages). 

For each identified address: 

Generating a representation of the identified address (column 5 line 55 - column 
6 line 22, Monier discloses generating a fingerprint representation of the specified URL), and 

Determining whether the representation is stored in the buffer, and when this 
determination is negative, storing the representation in the buffer (column 5 line 43 - column 6 
line 22, Monier discloses determining whether the representation is stored in the hash table, and 
when this determination is negative, storing the representation in the hash table). 

When the buffer reaches a predefined full condition: 

Ordering the contents of the buffer according to the representations (column 6, lines 1-33, 
Monier discloses ordering the contents of the hash table according to the fingerprint 
representations), and 

Monier discloses appending contents of the hash table into the contents of the disk file 
(column 6 line 22 - column 7 line 12). Monier fails to teach performing an ordered merge of the 
contents of the buffer into the contents of the first disk file. However, Najork teaches sorting an 
index of representations and performing a sorted merge of the index with a disk file (column 3
line 30 - column 4 line 45 and column 6 line 1 - column 7 line 25). 

It would have been obvious for one having ordinary skill in the art to perform a sorted 
merge of the hash table with the disk file as per the teachings of Najork for facilitating look-up 
operations on the disk file. 

Selecting the other buffer as the current buffer, wherein the previously current buffer is 
identified as a non-current buffer (column 6, lines 22-67, Monier discloses selecting the append 
buffer as the current buffer, wherein the hash table is identified as a non-current buffer). 
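
The steps recited in paragraph 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Monier, Najork or the applicant: the names `fingerprint`, `ordered_merge` and `AddressStore` are hypothetical, a truncated SHA-256 digest stands in for the fingerprint, an in-memory sorted list stands in for the first disk file, and the merge runs synchronously when the buffer fills.

```python
import hashlib


def fingerprint(url: str) -> int:
    """Hypothetical 64-bit representation of an address (assumption: truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(url.encode("utf-8")).digest()[:8], "big")


def ordered_merge(buffer_fps, disk_fps):
    """Merge a sorted buffer into a sorted 'disk file' list.

    Returns (merged, new), where `new` holds the buffer entries that were
    not already present in the disk file.
    """
    merged, new = [], []
    i = j = 0
    while i < len(buffer_fps) and j < len(disk_fps):
        if buffer_fps[i] < disk_fps[j]:
            merged.append(buffer_fps[i])
            new.append(buffer_fps[i])
            i += 1
        elif buffer_fps[i] > disk_fps[j]:
            merged.append(disk_fps[j])
            j += 1
        else:  # already on disk: keep one copy, do not report it as new
            merged.append(disk_fps[j])
            i += 1
            j += 1
    for fp in buffer_fps[i:]:
        merged.append(fp)
        new.append(fp)
    merged.extend(disk_fps[j:])
    return merged, new


class AddressStore:
    """Two buffers plus an ordered 'disk file' (here a list, for illustration)."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # the predefined full condition
        self.buffers = [set(), set()]
        self.current = 0              # index of the current buffer
        self.disk = []                # stand-in for the ordered first disk file
        self.new_fps = []             # representations not found on disk during merging

    def add(self, url: str) -> bool:
        """Store the representation if absent from the current buffer."""
        fp = fingerprint(url)
        buf = self.buffers[self.current]
        if fp in buf:
            return False
        buf.add(fp)
        if len(buf) >= self.capacity:
            self.flush()
        return True

    def flush(self) -> None:
        """Order the buffer, merge it into the disk file, then swap buffers."""
        buf = self.buffers[self.current]
        self.disk, new = ordered_merge(sorted(buf), self.disk)
        self.new_fps.extend(new)
        buf.clear()
        self.current = 1 - self.current
```

In this sketch the entries collected in `new_fps` are the ones a crawler would go on to schedule for downloading, since they were not found in the disk file during the merge.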

8. In reference to claims 2,14,24,32,39 and 47, Monier in view of Najork teaches the method,
the computer program and the web crawler system of claims 1,13,23,31,38 and 46 above, 
wherein after determining that the representation is not stored in the buffer, the identified address 
is stored in the buffer (column 5 line 43 - column 6 line 22, Monier discloses that after 
determining that the representation is not stored in the hash table, the identified address is stored 
in the hash table). 

9. In reference to claims 3,15,25,33,40 and 48, Monier in view of Najork teaches the method, the
computer program and the web crawler system of claims 1,13,23,31,38 and 46 above, wherein: 

After determining that the representation is not stored in the buffer, the identified address 
is stored in a second disk file (column 9, lines 25-40, Monier discloses after determining that the 
representation is not stored in the hash table, the identified address is stored in a second disk 
file), and 




Additionally storing with each representation in the buffer a pointer to the corresponding 
address stored in the second disk file (column 3, lines 1-20, column 5 & column 6, lines 20-53, 
Monier discloses additionally storing with each representation in the RAM a pointer to the 
corresponding address stored in the second disk file), and 

While ordering the contents of the buffer, keeping with each representation in the buffer 
its pointer to the corresponding address in the second disk file (column 5 & column 6, lines 20- 
53, Monier discloses while ordering the contents of the hash table (in RAM), keeping with each 
representation in the hash table its pointer to the corresponding address in the disk file). 
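
The pointer arrangement described in paragraph 9 can be sketched as follows, as an illustration only. The helpers `append_address` and `read_address` are hypothetical, an in-memory `io.BytesIO` object stands in for the second disk file, and the length-prefixed record layout is an assumption rather than anything disclosed in the references.

```python
import io


def append_address(url_file, url: str) -> int:
    """Append an address to the 'second disk file' and return its byte offset."""
    offset = url_file.seek(0, io.SEEK_END)
    data = url.encode("utf-8")
    # Assumed record layout: 4-byte big-endian length, then the UTF-8 address.
    url_file.write(len(data).to_bytes(4, "big") + data)
    return offset


def read_address(url_file, offset: int) -> str:
    """Follow a stored pointer back to the full address."""
    url_file.seek(offset)
    length = int.from_bytes(url_file.read(4), "big")
    return url_file.read(length).decode("utf-8")


# Each buffer entry pairs a representation with its pointer, so sorting the
# buffer by representation carries the pointer along with it.
url_file = io.BytesIO()
buffer = [
    (30, append_address(url_file, "http://a.example/")),
    (10, append_address(url_file, "http://b.example/")),
]
buffer.sort()
```

After the sort, `read_address(url_file, buffer[0][1])` still returns the address that belongs to the smallest representation, because each pointer stays attached to its representation.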

10. In reference to claims 4 and 16, Monier in view of Najork teaches the method of claims 3
and 15 above, wherein when the buffer reaches a predefined full condition:

Each representation in the buffer stores an associated flag, setting the flag to a first value 
when the representation is equal to a representation previously stored in the first disk file, and 
setting the flag to a second value, when the representation is not equal to any representation 
previously stored in the first disk file (column 5 lines 25-35, & column 8 lines 45-65, Monier 
discloses each representation in the hash table stores an associated "fetched flag", setting the
flag to a first value when the representation is equal to a representation previously stored in the 
disk file, and setting the flag to a second value, when the representation is not equal to any 
representation previously stored in the disk file), and 

Each representation whose flag is set to the second value, scheduling the corresponding 
data set for downloading (column 9, lines 25-50, Monier discloses each representation whose 
flag is set to the second value and marked as "not fetched", scheduling the corresponding data set 
for fetching). 

11. In reference to claims 7,20,28,36,43 and 51, Monier in view of Najork teaches the method,
the computer program and the web crawler system of claims 1,13,23,31,38 and 46 above,
wherein the representation of the identified address comprises a checksum of at least a portion of 
the identified address (column 5 line 55 - column 6 line 22, Monier discloses the representation 
of the identified URL comprising a fingerprint of at least a portion of the identified URL). 

12. In reference to claims 8,21,29 and 44, Monier in view of Najork teaches the method, the
computer program and the web crawler system of claims 1,13,23 and 38 above, wherein:

Determining whether the representation is stored in a cache before determining whether the
representation is stored in the buffer (columns 6&7, Monier discloses determining whether the 
representation is stored in append buffer before determining whether the representation is stored 
in an input buffer (in RAM)), and 

Determining whether the representation is stored in a cache, and if positive, skipping the 
determination of whether the representation is stored in the buffer (columns 6&7, Monier 
discloses determining whether the representation is stored in an append buffer, and if positive, 
skipping the determination of whether the representation is stored in the input buffer), and 

When the representation is not stored in the cache, the cache has not reached a predefined 
full condition, and other predefined criteria are met, adding the representation to the cache 
(columns 6&7, Monier discloses when the representation is not stored in the append buffer, the 
host name table has not reached a predefined full condition, and other predefined criteria are met, 
adding the representation to the input buffer), and 

When the representation is not stored in the cache, the cache has reached said predefined 
full condition, and said other predefined criteria are met, evicting a stored representation from 
the cache in accordance with an eviction policy and adding the representation to the cache
(columns 6&7, Monier discloses when the representation is not stored in the append buffer, the 
append buffer has reached said predefined full condition, and said other predefined criteria are 
met, evicting a stored representation from the append buffer in accordance with an eviction 
policy and adding the representation to the append buffer). 
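
The cache-before-buffer check recited in paragraph 12 can be sketched as follows. This is illustrative only: `FingerprintCache` and `check_and_record` are hypothetical names, least-recently-used eviction is an assumed policy (the claim language requires only "an eviction policy"), and the "other predefined criteria" are omitted.

```python
from collections import OrderedDict


class FingerprintCache:
    """Small fixed-capacity cache; LRU eviction is an assumption."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()

    def contains(self, fp: int) -> bool:
        if fp in self._entries:
            self._entries.move_to_end(fp)  # mark as most recently used
            return True
        return False

    def add(self, fp: int) -> None:
        if fp in self._entries:
            self._entries.move_to_end(fp)
            return
        if len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        self._entries[fp] = True


def check_and_record(fp: int, cache: FingerprintCache, buffer: set) -> bool:
    """Return True only if `fp` was in neither the cache nor the buffer."""
    if cache.contains(fp):
        return False          # cache hit: the buffer lookup is skipped
    cache.add(fp)             # cache miss: add, evicting first if the cache is full
    if fp in buffer:
        return False
    buffer.add(fp)
    return True
```

A cache hit returns before the buffer is consulted, which corresponds to the recited step of skipping the buffer determination.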

13. In reference to claims 9,10,17,30,37,45 and 52, Monier in view of Najork teaches the
method, the computer program and the web crawler system of claims 1,23,31,38 and 46 above,
wherein when a representation in the first buffer is not found in the first disk file during merging, 
scheduling the corresponding data set for downloading (columns 6-8, Monier discloses that when 
a representation in the hash table is not found in the disk file during merging, scheduling the 
corresponding web page for fetching). 

14. Claim 22 is rejected under 35 U.S.C. 103(a) as being obvious over Monier (U.S. Patent No.
5,974,455) in view of Najork (U.S. Patent No. 6,321,265). 

15. The applied reference has a common inventor with the instant application. Based upon
the earlier effective U.S. filing date of the reference, it constitutes prior art only under 35 U.S.C. 
102(e). This rejection under 35 U.S.C. 103(a) might be overcome by: (1) a showing under 37 
CFR 1.132 that any invention disclosed but not claimed in the reference was derived from the
inventor of this application and is thus not an invention "by another"; (2) a showing of a date of 
invention for the claimed subject matter of the application which corresponds to subject matter 
disclosed but not claimed in the reference, prior to the effective U.S. filing date of the reference 
under 37 CFR 1.131; or (3) an oath or declaration under 37 CFR 1.130 stating that the
application and reference are currently owned by the same party and that the inventor named in 
the application is the prior inventor under 35 U.S.C. 104, together with a terminal disclaimer in 
accordance with 37 CFR 1.321(c). For applications filed on or after November 29, 1999, this 
rejection might also be overcome by showing that the subject matter of the reference and the 
claimed invention were, at the time the invention was made, owned by the same person or 
subject to an obligation of assignment to the same person. See MPEP § 706.02(l)(1) and §
706.02(l)(2).

16. Monier teaches downloading data sets from among a plurality of host computers 
comprising the following steps: 

Storing representations of data set addresses in a set of data structures, including a first 
buffer, a second buffer and a first disk file, wherein representations of data set addresses stored 
in the first disk file are ordered (column 3, lines 1-35, Monier discloses storing URL 
representations in a set of data structures, including a hash table (stored in random access 
memory (RAM)), an append buffer (stored in RAM) and a sequential disk file, wherein the 
representations are stored sequentially in the disk file). 

Selecting as a current buffer one of the first and second buffers (column 6, lines 35-45, 
Monier discloses selecting and managing a current buffer among the hash table and append 
buffer). 

Downloading at least one data set that includes addresses of one or more referred data 
sets (column 5, lines 20-30, Monier discloses fetching web pages that include URLs of one or
more referred web pages). 

Identifying the addresses of the one or more referred data sets (column 5, lines 20-30,
Monier discloses analyzing and identifying the addresses of the one or more referred web pages). 

Generating a representation of the identified address (column 5 line 55 - column 6 line 
22, Monier discloses generating a fingerprint representation of the specified URL), and 

Monier discloses determining whether the representation is stored in the hash table 
(column 5 line 43 - column 6 line 22). Monier fails to teach whether the disk file is empty, and 
when the representation is not stored in the buffer and the disk file is empty, scheduling the 
corresponding data set for downloading. However, Najork teaches determining if a queue is 
empty and if it is empty then downloading data set addresses to the queue (column 3 line 1 - 
column 4 line 5). 

It would have been obvious for one of ordinary skill in the art to download the data set 
corresponding to the representations in the hash table to the disk file if the disk file is empty as 
per the teachings of Najork so that new URLs can be stored as they are processed. 

Monier discloses determining if the representations have been previously stored in the 
hash table/disk file (columns 8&9). Monier fails to teach when the representation is not stored in 
the buffer and the disk file is not empty, storing the representation in the buffer and delaying 
scheduling of the corresponding data set for downloading until it is determined that the 
representation has not been previously stored in the disk file. However, Najork teaches 
determining if the queue is not empty then delaying and assigning a download time for the data 
set addresses (column 3 line 1 - column 4 line 5). 

It would have been obvious for one ordinarily skilled in the art to assign a download time 
for the data set addresses as per the teachings of Najork that would allow sufficient time to 
determine if the representations are stored in the disk file as per the teachings of Monier so that
duplicate representations will not be stored. 
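
The two branches discussed in paragraph 16 for claim 22 can be sketched as follows, as an illustration only. The function `handle_new_address` and its return labels are hypothetical, and recording the representation in the buffer in both branches is an assumption made so that later duplicates are detected.

```python
def handle_new_address(fp, buffer, disk_file_empty, schedule):
    """Decide what to do with a representation that may be new.

    `schedule` is a callable that queues the corresponding data set for
    downloading (hypothetical interface).
    """
    if fp in buffer:
        return "duplicate"
    buffer.add(fp)  # assumption: recorded in the buffer in both branches
    if disk_file_empty:
        # Nothing on disk to compare against, so schedule immediately.
        schedule(fp)
        return "scheduled"
    # Disk file not empty: scheduling waits until a later merge shows that
    # the representation was not previously stored in the disk file.
    return "deferred"
```

A deferred representation would be scheduled only if the later comparison against the disk file finds no match.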

17. Claims 5,6,11,12,18,19,26,27,34,35,41,42,49 and 50 are rejected under 35 U.S.C. 103(a)
as being unpatentable over Monier (U.S. Patent No. 5,974,455) in view of Najork (U.S. Patent
No. 6,301,614) in further view of Cabrera et al. (U.S. Patent No. 5,953,729).

18. In reference to claims 5,11,18,26,34,41 and 49, Monier teaches the method, the computer
program and the web crawler system of claims 1,13,23,31,38 and 46 above. 

Monier does not teach storing representations of data set addresses in a sparse disk file 
which is divided into portions (or sub-files), each portion having a starting address and contents 
comprising an ordered list of representations of data addresses. However, Cabrera teaches sparse 
file technology divided into clusters each having a cluster number (column 9, lines 40-66). 

It would have been obvious to one having ordinary skill in the art to modify Monier by 
storing URL representations in a sparse file as per the teachings of Cabrera so as to minimize the 
overhead in managing and ordering the contents on the disk file. 

19. Monier does not teach merging the contents of the buffer with the ordered contents of the 
sparse disk file to include determining a starting address for a corresponding portion of the 
sparse disk file. However, Cabrera teaches sparse file technology which can indicate starting 
cluster numbers for portions of the sparse file (columns 9&10). 

It would have been obvious to one having ordinary skill in the art to modify Monier by 
when merging the contents of the hash table with the ordered contents of the sparse file, to 
include determining a starting cluster number for a corresponding portion of the sparse disk file 
as per the teachings of Cabrera so as to minimize the overhead for merging and ordering of the
contents on the disk file. 

20. Monier does not teach merging the contents of the buffer with the ordered contents of the 
sparse disk file to include performing an ordered merge of a subset of the buffer, starting at the 
representation for which the starting address was obtained, into the contents of the corresponding 
portion. However, Cabrera teaches sparse file technology which can indicate starting cluster
numbers for portions of the sparse file (columns 9&10). 

It would have been obvious to one having ordinary skill in the art to modify Monier by 
when merging the contents of the hash table with the ordered contents of the sparse disk file to 
include performing an ordered merge of a subset of the hash table, starting at the representation 
for which the starting address was obtained, into the contents of the corresponding portion as per 
the teachings of Cabrera so as to minimize the overhead in merging and ordering the contents on 
the disk file. 

21. In reference to claims 6,12,19,27,35,42 and 50, Monier teaches the method, the computer 
program and the web crawler system of claims 1,13,23,31,38 and 46 above.

22. Monier does not teach storing representations of data set addresses in a sparse disk file 
having empty entries interspersed among entries storing said representations. However, Cabrera 
teaches sparse file technology which comprises a mixture of zero data and non-zero data (column 
7, lines 20-50). 

It would have been obvious to one having ordinary skill in the art to modify Monier by 
storing representations of data set addresses in a sparse disk file having zero data interspersed 
among data of said representations as per the teachings of Cabrera so as to minimize the
overhead in sequentially ordering the data contents on the disk file. 

23. Monier teaches sequentially scanning the disk file via an input buffer, starting at the 
representation for which a starting address was obtained, until a representation matching the 
respective representation is found (column 6 lines 35-67 & column 9 lines 25-50). Monier does 
not teach scanning the disk file until one of the empty entries is found, and when an empty entry 
is found storing the respective representation in the empty entry. However, Cabrera teaches 
sparse file technology which comprises a mixture of zero data and non-zero data (column 7, lines 
20-50). 

It would have been obvious to one having ordinary skill in the art to modify Monier by 
scanning the disk file until one of the zero data entries is found as per the teachings of Cabrera, 
and when a zero data entry is found storing the respective representation in the zero data entry, so
as to minimize the overhead of ordering the data contents on the disk file while merging the 
contents of the hash table with the contents of the disk file. 
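
The scan described in paragraphs 22 and 23 can be sketched as follows, as an illustration only. A Python list with `None` for empty entries stands in for a portion of the sparse disk file, the function `probe_insert` is hypothetical, and the sketch does not attempt to keep the entries ordered.

```python
EMPTY = None  # stand-in for a zero-data (empty) entry in the sparse file


def probe_insert(entries, start, fp):
    """Scan from `start` until `fp` or an empty entry is found.

    Returns (index, inserted): `inserted` is False when a matching
    representation was already present, and True when `fp` was written
    into the first empty entry encountered.
    """
    for idx in range(start, len(entries)):
        if entries[idx] == fp:
            return idx, False
        if entries[idx] is EMPTY:
            entries[idx] = fp
            return idx, True
    raise RuntimeError("no matching or empty entry found after start")
```

For example, with `entries = [10, None, 20, None]`, scanning from index 0 for 15 stops at the first empty entry and stores the value there.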

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Ramy M Osman whose telephone number is (703) 305-8050. 
The examiner can normally be reached on Monday through Friday 9AM to 5PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Ario Etienne, can be reached on (703) 305-7562. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306. 






Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

RMO 

February 19, 2004 